14.8 Predictive Sparse Decomposition¶
Training Proceeds by minimizing
\[||x - g(x)||^2 + \lambda|h|_1 + \gamma||h - f(x)||^2\]
During training, h is controlled by the optimization algorithm. The training alternates between minimization with respect to h and minimization respect to the model parameters.
The interactive optimization is used only during training. The parametric encoder f is used to compute the learned feature when the model is deployed. PSD models maybe stacked and used to initialize a deep network to be trained with another criterion.